We explore the effect of COVID-19 pandemic on crime rates across different cities in the US to get a glimpse into the future of crime in these unprecedented circumstances.
Our goal is to use COVID and Crime data to make future predictions on crime rates in this pandemic. We also look at the crime rates before and after the pandemic to gain insight into how quarantine and self-isolation measures influence human behaviour leading to an increase in specific types of crime.
Click on the location markers on the left to view their analysis !
Some text desccription
Some text desccription
This is text link
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
---
title: "COVID-19 and US Crime"
author: ""
output:
flexdashboard::flex_dashboard:
theme: flatly
orientation: rows
social: menu
source: embed
---
```{r setup, include=FALSE}
library(ggplot2)
library(plotly)
library(plyr)
library(flexdashboard)
library(RSocrata)
library(tidyverse)
library(tsbox) # transform data into time series
library(xts)
library(COVID19) # to get data about covid 19
library(forecast) #ariRI model
library(vars) #VAR and Causality
library(dygraphs)
library(leaflet)
library(htmlwidgets)
#[INCOMPLETE] Graphs in Chicago have been converted here to Plotly HTML Widgets
# Make some noisily increasing data [Testing Purposes]
set.seed(955)
dat <- data.frame(cond = rep(c("A", "B"), each=10),
xvar = 1:20 + rnorm(20,sd=3),
yvar = 1:20 + rnorm(20,sd=3))
#Load Chicago Data
covid19_CH <- covid19("USA", level = 3) %>%
# this cook county contains chicago
filter(administrative_area_level_3 == "Cook",
administrative_area_level_2 == "Illinois" ) %>%
# filter out days when confirmed is zero or one
# becasue when it was 2 for a very long time
filter(confirmed > 2)
```
Overview {.storyboard}
=======================================================================
### Introduction
```{r}
Location <- c("Providence ","Los Angeles","Chicago","Boston","Seattle","Atlanta","Philadelphia" )
lat <- c(41.8240,33.82377,41.78798,42.361145,47.714965,33.753746, 39.952583)
lng <- c( -71.4128,-118.2668,-87.7738,-71.057083,-122.127166 ,-84.386330,-75.165222)
df <- data.frame(Location, lat,lng)
jsCode <- paste0('
function(el, x) {
var marker = document.getElementsByClassName("leaflet-marker-icon leaflet-zoom-animated leaflet-interactive");
marker[0].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#providence");};
marker[1].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#los-angeles");};
marker[2].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#chicago");};
marker[3].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#boston");};
marker[4].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#seattle");};
marker[5].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#atlanta");};
marker[6].onclick=function(){window.open("http://individual.utoronto.ca/nayan/datafest.html#philadelphia");};
}
')
m <- leaflet() %>%
addTiles() %>% # Add default OpenStreetMap map tiles
addMarkers(data=df,)%>%
onRender(jsCode)
m # Print the map
```
***
We explore the effect of COVID-19 pandemic on crime rates across different cities in the US to get a glimpse into the future of crime in these unprecedented circumstances.
Our goal is to use COVID and Crime data to make future predictions on crime rates in this pandemic. We also look at the crime rates before and after the pandemic to gain insight into how quarantine and self-isolation measures influence human behaviour leading to an increase in specific types of crime.
Click on the location markers on the left to view their analysis !
### Key Results
```{r}
```
***
Some text desccription
### Coming Soon
```{r}
```
***
Some text desccription
Chicago
=======================================================================
Row{data-height=300}
-------------------------------------
### Impulse response function (criminal damage)
```{r get the data for this city}
if (exists("chicago")) is.data.frame(get("chicago")) else chicago <- RSocrata::read.socrata(
"https://data.cityofchicago.org/resource/ijzp-q8t2.csv?$where=year >= 2014",
app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
```
```{r}
# add date
chicago <- chicago %>%
mutate(Date = substr(date, start = 1, stop = 10)) %>%
mutate(y_month = substr(date, start = 1, stop = 7)) %>%
mutate(month = substr(date, start = 6, stop = 7))
# summary of all crime
chicago_summary <- chicago %>%
group_by(primary_type) %>%
summarise(number_of_crime = n()) %>%
arrange(desc(number_of_crime))
```
```{r extract cases}
# extract top 5 crime
top5crime <- chicago %>%
filter(primary_type %in% head(chicago_summary$primary_type, 5)) %>%
group_by(Date, primary_type) %>%
tally() %>%
spread(primary_type, n)
# rename columns
colnames(top5crime) <- c('time',
"assault",
"battery",
"criminal",
'deceptive',
"theft")
top5crime <- na.omit(top5crime)
# create time series
top5crime_xts <- ts_xts(top5crime[,1:2])
for (i in (3:ncol(top5crime))){
temp_xts <- ts_xts(top5crime[, c(1,i)])
top5crime_xts <- merge(top5crime_xts, temp_xts)
}
# extract difference, change per day
top5crime_diff <- na.omit(diff(top5crime_xts))
```
```{r covid 19 related exploration, message=F, warning=F}
# extract for tranforming into time series data
ts_CH <- covid19_CH %>%
dplyr::select(date, confirmed) %>%
ts_xts()
# try first log difference
ts_diff_CH <- na.omit(diff(ts_CH))
covid19_CH_diff <- data.frame(diff(covid19_CH$confirmed))
colnames(covid19_CH_diff)[1] = "confirmed"
covid19_CH_diff$date = covid19_CH$date[2:length(covid19_CH$date)]
# time as integer
covid19_CH_diff$timeInt = as.numeric(covid19_CH_diff$date)
# make a copy to avoid perfect collinearity
covid19_CH_diff$timeIid = covid19_CH_diff$timeInt
# GAMM model
# 50 too overfit. 15 looks decent
gamCH <- gamm4::gamm4(confirmed ~ s(timeInt, k=50), random = ~(1|timeIid),
data=covid19_CH_diff, family=poisson(link='log'))
toPredict = data.frame(time = seq(covid19_CH_diff$date[1],
covid19_CH_diff$date[length(covid19_CH_diff$date)],
by = '1 day'))
toPredict$timeInt = as.numeric(toPredict$time)
# obtain forecast
forecast <- data.frame(exp(do.call(cbind, mgcv::predict.gam(gamCH$gam, toPredict, se.fit=TRUE))))
# access residuals
CH_res <- data.frame(covid19_CH_diff$confirmed - forecast$fit)
# transform into time series
CH_res$time = covid19_CH_diff$date
colnames(CH_res)[1] = "residuals"
col_order <- c("time", "residuals")
CH_res <- CH_res[, col_order]
CH_res_ts <- ts_xts(CH_res)
```
```{r top 5 crime VAR}
# specify common time range
# start from when covid was a thing
# end on May 25, to avoid effect of protests related to George Floyid.
common_time <- seq.Date(start(CH_res_ts), as.Date("2020-05-25"), by = "day")
# combine time series of crime and covid
combined_diff <- merge(top5crime_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
CH_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
```
```{r construct var}
optimal_assault <- VARselect(na.omit(combined_diff)[,c(1,6)], type = 'none', lag.max = 10)
optimal_battery <- VARselect(na.omit(combined_diff)[,c(2,6)], type = 'none', lag.max = 10)
optimal_criminal <- VARselect(na.omit(combined_diff)[,c(3,6)], type = 'none', lag.max = 10)
optimal_deceptive <- VARselect(na.omit(combined_diff)[,c(4,6)], type = 'none', lag.max = 10)
optimal_theft <- VARselect(na.omit(combined_diff)[,c(5,6)], type = 'none', lag.max = 10)
VAR_assault <- VAR(y=as.ts(na.omit(combined_diff)[,c(1,6)]), p=optimal_assault$selection[1])
VAR_battery <- VAR(y=as.ts(na.omit(combined_diff)[,c(2,6)]), p=optimal_battery$selection[1])
VAR_criminal <- VAR(y=as.ts(na.omit(combined_diff)[,c(3,6)]),
p=optimal_criminal$selection[1])
VAR_deceptive <- VAR(y=as.ts(na.omit(combined_diff)[,c(4,6)]),
p=optimal_deceptive$selection[1])
VAR_theft <- VAR(y=as.ts(na.omit(combined_diff)[,c(5,6)]), p=optimal_theft$selection[1])
```
```{r, warning=F, message=F}
vehicle <- chicago %>%
filter(primary_type == 'MOTOR VEHICLE THEFT')%>%
group_by(date, primary_type) %>%
tally() %>%
spread(primary_type, n)
colnames(vehicle) <- c('time', 'vehicle')
vehicle_xts <- ts_xts(na.omit(vehicle)[,1:2])
vehicle_diff <- na.omit(diff(vehicle_xts))
combined_diff2 <- merge(vehicle_diff[paste(common_time[1],
common_time[length(common_time)],
sep = "/")],
CH_res_ts[paste(common_time[1],
common_time[length(common_time)],
sep = "/")])
optimal_vehicle <- VARselect(na.omit(combined_diff2)[,c(1,2)], type = 'none', lag.max = 10)
VAR_vehicle <- VAR(y=as.ts(na.omit(combined_diff2)[,c(1,2)]), p=optimal_vehicle$selection[1])
```
```{r irf}
# use car theft and criminal damage
par(mfrow = c(1,2))
lags = c(1:25)
# criminal damange
# only significant one is from covid to crime
irf_criminal_2 <- irf(VAR_criminal,
impulse = "residuals",
response = "criminal",
n.ahead = 24,
ortho = F)
# ggplot version of it.
irf_criminal_2_gg <- data.frame(
irf_criminal_2$irf$residuals[,1],
irf_criminal_2$Lower$residuals[,1],
irf_criminal_2$Upper$residuals[,1]
)
colnames(irf_criminal_2_gg) <- c("mean", "lower", "upper")
i1 <- ggplot(irf_criminal_2_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more criminal damage cases per day there will be
after 1 confirmed COVID-19 case") +
xlab("Number of days after 1 COVID-19 case")+
ylab("Number of criminal damange per day")
ggplotly(i1)
```
### Impulse response function (vehicle theft)
```{r}
# vehicle theft
# only significant one is from covid to crime
irf_vehicle_2 <- irf(VAR_vehicle,
impulse = "residuals",
response = "vehicle",
n.ahead = 24)
# ggplot version of it
irf_vehicle_2_gg <- data.frame(
irf_vehicle_2$irf$residuals[,1],
irf_vehicle_2$Lower$residuals[,1],
irf_vehicle_2$Upper$residuals[,1]
)
colnames(irf_vehicle_2_gg) <- c("mean", "lower", "upper")
i2 <- ggplot(irf_vehicle_2_gg, aes(x = lags)) +
geom_line(aes(y = mean), color = "black") +
geom_hline(yintercept = 0, color = "blue") +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "lightblue", alpha = 0.5) +
theme_classic() +
ggtitle("How many more vehicle theft cases per day there will be
after 1 confirmed COVID-19 case") +
xlab("Number of days after 1 COVID-19 case")+
ylab("Number of vehicle theft per day")
ggplotly(i2)
```
Row {data-height=400 .tabset .tabset-fade }
-------------------------------------
### Chicago Crime Summary
```{r}
# add date
chicago <- chicago %>%
mutate(Date = substr(date, start = 1, stop = 10)) %>%
mutate(y_month = substr(date, start = 1, stop = 7)) %>%
mutate(month = substr(date, start = 6, stop = 7))
# summary of all crime
chicago_summary <- chicago %>%
group_by(primary_type) %>%
summarise(number_of_crime = n()) %>%
arrange(desc(number_of_crime))
# looks life theft is seeing sharp drop
# year to year comparison
plt <- chicago %>%
dplyr::select(y_month, month, primary_type, year) %>%
filter(primary_type %in% chicago_summary$primary_type[1:5], y_month != "2020-06") %>%
count(year, month, primary_type) %>%
na.omit()%>% ggplot(aes(x=month, y=n, group = year, color = as.character(year))) + geom_line() + facet_free(~primary_type,scales = "free", space = "free")+xlab("Month") +
ylab("Cases") + theme(legend.title = element_blank())
ggplotly(plt)
```
### VAR forecast for criminal damage
```{r}
# criminal damange and vehicle theft
forecast_criminal <- forecast(VAR_criminal)
forecast_vehicle <- forecast(VAR_vehicle)
forecast_assault <- forecast(VAR_assault)
forecast_battery <- forecast(VAR_battery)
forecast_deceptive <- forecast(VAR_deceptive)
forecast_theft <- forecast(VAR_theft)
# only impulse from covid are significant
# only forecast on crime are being helped
varf1 <- autoplot(forecast_criminal$forecast$criminal) +
ggtitle("Prediction on how many more criminal damage cases
compared to yesterday") +
theme_classic() +
ylab("Day-to-day change") +
xlab(paste("Numebr of days since", common_time[1]))
ggplotly(varf1)
```
### VAR forecast for vehicle theft
```{r}
varf2 <- autoplot(forecast_vehicle$forecast$vehicle) +
ggtitle("Prediction on how many more vehicle theft cases
compared to yesterday") +
theme_classic() +
ylab("Day-to-day change") +
xlab(paste("Numebr of days since", common_time[1]))
ggplotly(varf2)
```
### VAR forecast for assault
```{r}
varf3 <- autoplot(forecast_assault$forecast$assault) +
ggtitle("Prediction on how many more assault cases
compared to yesterday") +
theme_classic() +
ylab("Day-to-day change") +
xlab(paste("Numebr of days since", common_time[1]))
ggplotly(varf3)
```
### VAR forecast for battery
```{r}
varf4 <- autoplot(forecast_battery$forecast$battery) +
ggtitle("Prediction on how many more battery cases
compared to yesterday") +
theme_classic() +
ylab("Day-to-day change") +
xlab(paste("Numebr of days since", common_time[1]))
ggplotly(varf4)
```
### VAR forecast for deceptive
```{r}
varf5 <- autoplot(forecast_deceptive$forecast$deceptive) +
ggtitle("Prediction on how many more deceptive practice cases
compared to yesterday") +
theme_classic() +
ylab("Day-to-day change") +
xlab(paste("Numebr of days since", common_time[1]))
ggplotly(varf5)
```
### VAR forecast for theft
```{r}
varf6 <- autoplot(forecast_theft$forecast$theft) +
ggtitle("Prediction on how many more theft cases
compared to yesterday") +
theme_classic() +
ylab("Day-to-day change") +
xlab(paste("Numebr of days since", common_time[1]))
ggplotly(varf6)
```
Row {data-height=300}
-----------------------------------------------------------------------
### Daily confirmed cases of COVID19 in Chicago
```{r}
# if (exists("chicago")) is.data.frame(get("chicago")) else chicago <- RSocrata::read.socrata(
# "https://data.cityofchicago.org/resource/ijzp-q8t2.csv?$where=year >= 2014",
# app_token = "hPU78MH7zKApdpUv4OVCInPOQ")
#
#
# # add date
# chicago <- chicago %>%
# mutate(Date = substr(date, start = 1, stop = 10)) %>%
# mutate(y_month = substr(date, start = 1, stop = 7)) %>%
# mutate(month = substr(date, start = 6, stop = 7))
#
# # summary of all crime
# chicago_summary <- chicago %>%
# group_by(primary_type) %>%
# summarise(number_of_crime = n()) %>%
# arrange(desc(number_of_crime))
#
# # looks life theft is seeing sharp drop
#
# # year to year comparison
# plt <- chicago %>%
# dplyr::select(y_month, month, primary_type, year) %>%
# filter(primary_type %in% chicago_summary$primary_type[1:5]) %>%
# count(year, month, primary_type) %>%
# na.omit()%>% ggplot(aes(x=month, y=n, group = year, color = as.character(year))) + geom_line() + facet_free(~primary_type,scales = "free", space = "free")+xlab("Month") +
# ylab("Cases") + theme(legend.title = element_blank())
#
# ggplotly(plt)
#TEST CHUNK UNCOMMENT TO REDUCE FILE RUN TIME [Design]
# n <- 20
# x1 <- rnorm(n); x2 <- rnorm(n)
# y1 <- 2 * x1 + rnorm(n)
# y2 <- 3 * x2 + (2 + rnorm(n))
# A <- as.factor(rep(c(1, 2), each = n))
# df <- data.frame(x = c(x1, x2), y = c(y1, y2), A = A)
# fm <- lm(y ~ x + A, data = df)
#
# p <- ggplot(data = cbind(df, pred = predict(fm)), aes(x = x, y = y, color = A))
# p <- p + geom_point() + geom_line(aes(y = pred))
# ggplotly(p)
dygraph(ts_diff_CH,
main = "Daily confirmed cases of
COVID-19 in Chicago")%>%
dySeries("cd7b965f", label = "Chicago")%>%
dyLegend(show = "always", hideOnMouseOut = FALSE)
```
### Summary
This is text [link](http://www.example.com)
- one
- two
- three
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Providence
=======================================================================
Boston
=======================================================================
Philadelphia
=======================================================================
Los Angeles
=======================================================================
Atlanta
=======================================================================
Seattle
=======================================================================